Non-exponential Reward Discounting in Reinforcement Learning

نویسندگان

چکیده

Reinforcement learning methods typically discount future rewards using an exponential scheme to achieve theoretical convergence guarantees. Studies from neuroscience, psychology, and economics suggest that human animal behavior is better captured by the hyperbolic discounting model. Hyperbolic has recently been studied in deep reinforcement shown promising results. However, this area of research seemingly understudied, with most extant continuing standard formulation. My dissertation examines effects non-exponential functions (such as hyperbolic) on agent's aims investigate their impact multi-agent systems generalization tasks. A key objective study link rate approximation underlying hazard its environment through survival analysis.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance

Several studies have demonstrated that reward from a human trainer can be a powerful feedback signal for control-learning algorithms. However, the space of algorithms for learning from such human reward has hitherto not been explored systematically. Using model-based reinforcement learning from human reward, this article investigates the problem of learning from human reward through six experim...

متن کامل

Maximum reward reinforcement learning: A non-cumulative reward criterion

Existing reinforcement learning paradigms proposed in the literature are guided by two performance criteria; namely: the expected cumulativereward, and the average reward criteria. Both of these criteria assume an inherently present cumulative or additivity of the rewards. However, such inherent cumulative of the rewards is not a definite necessity in some contexts. Two possible scenarios are p...

متن کامل

Reward, Motivation, and Reinforcement Learning

There is substantial evidence that dopamine is involved in reward learning and appetitive conditioning. However, the major reinforcement learning-based theoretical models of classical conditioning (crudely, prediction learning) are actually based on rules designed to explain instrumental conditioning (action learning). Extensive anatomical, pharmacological, and psychological data, particularly ...

متن کامل

Compatible Reward Inverse Reinforcement Learning

PROBLEM • Inverse Reinforcement Learning (IRL) problem: recover a reward function explaining a set of expert’s demonstrations. • Advantages of IRL over Behavioral Cloning (BC): – Transferability of the reward. • Issues with some IRL methods: – How to build the features for the reward function? – How to select a reward function among all the optimal ones? – What if no access to the environment? ...

متن کامل

An Average - Reward Reinforcement Learning

Recently, there has been growing interest in average-reward reinforcement learning (ARL), an undiscounted optimality framework that is applicable to many diierent control tasks. ARL seeks to compute gain-optimal control policies that maximize the expected payoo per step. However, gain-optimality has some intrinsic limitations as an optimality criterion, since for example, it cannot distinguish ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Proceedings of the ... AAAI Conference on Artificial Intelligence

سال: 2023

ISSN: ['2159-5399', '2374-3468']

DOI: https://doi.org/10.1609/aaai.v37i13.26916